Bug 1875773: ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml: Add EC2 endpoint #11723
Conversation
@wking: This pull request references Bugzilla bug 1875773, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold No rehearsals, because I updated the live stacks. Let's see if this helps proxy jobs before landing it.
@@ -173,6 +173,25 @@ Resources:
     Properties:
       SubnetId: !Ref PrivateSubnet3
       RouteTableId: !Ref PrivateRouteTable3
+  EC2Endpoint:
Should a TODO comment be added to remove this VPCE when https://bugzilla.redhat.com/show_bug.cgi?id=1769223 is fixed?
I dunno. We will probably need to keep this until we have end-of-lifed all releases for which the machine-API needs the endpoint. But we may want to split into two flavors of VPC so we can test proxy-respecting machine-API while continuing to run CI on proxy-ignoring machine-API. I'm fine tracking that in Jira or a bug or whatever instead of in release comments.
Force-pushed from e12db62 to a639ef9.
There are no rehearsals on this PR for us to make sure this change is correct.
ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml.md
OK, I think we should get one big proxy PR going. Trevor, do you want to pull #11750 into this PR?
This PR is just docs. I've already updated the CI stacks to match. And it should be just for EC2 access from the private subnet, so it seems fairly orthogonal to proxy-instance stuff.
Ah. Confusing. OK then... it appears the change documented here isn't effective per
…_yaml: Add EC2 endpoint

The machine-API currently ignores the proxy configuration, although a future machine-API might grow support for it [1]. That means CI jobs in the blackhole VPC die on i/o timeouts trying to reach https://ec2.${region}.amazonaws.com/ while provisioning compute machines, and the install subsequently dies because we fail to schedule monitoring, ingress, and other compute-hosted workloads [2].

This commit adds a VPC endpoint to allow EC2 access from inside the cluster [3]. It's similar to the existing S3 VPC endpoint, but:

* It's an interface type, while S3 needs the older gateway type. This avoids:

    Endpoint type (Gateway) does not match available service types ([Interface]). (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter; Request ID: ...; Proxy: null)

  while creating the stack.

* There are no RouteTableIds, because the interface type does not support them. This avoids:

    Route table IDs are only supported for Gateway type VPC Endpoint. (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameter; Request ID: ...; Proxy: null)

  while creating the stack.

* I've created a new security group allowing HTTPS connections to the endpoint, because SecurityGroupIds is required for interface endpoints [3]. I've also placed the network interfaces in the public subnets, because SubnetIds is required for interface endpoints [3].

* I've set PrivateDnsEnabled [3] so the machine-API operator doesn't have to do anything special to get DNS routing it towards the endpoint interfaces.

Rolled out to the CI account following 9b39dd2 (Creating private subnets without direct external internet access and updating proxy e2e to use this instead, 2020-07-20, openshift#10355):

    for REGION in us-east-1 us-east-2 us-west-1 us-west-2
    do
      COUNT=3
      if test us-west-1 = "${REGION}"
      then
        COUNT=2
      fi
      for INDEX in 1
      do
        NAME="do-not-delete-shared-vpc-blackhole-${INDEX}"
        aws --region "${REGION}" cloudformation update-stack --stack-name "${NAME}" --template-body "$(cat ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml.md)" --parameters "ParameterKey=AvailabilityZoneCount,ParameterValue=${COUNT}" >/dev/null
        aws --region "${REGION}" cloudformation wait stack-update-complete --stack-name "${NAME}"
        SUBNETS="$(aws --region "${REGION}" cloudformation describe-stacks --stack-name "${NAME}" | jq -c '[.Stacks[].Outputs[] | select(.OutputKey | endswith("SubnetIds")).OutputValue | split(",")[]]' | sed "s/\"/'/g")"
        echo "${REGION}_$((INDEX - 1))) subnets=\"${SUBNETS}\";;"
      done
    done

We could also have deleted the previous stacks, used 'create-stack' instead of 'update-stack', and used 'stack-create-complete' instead of 'stack-update-complete'.

Unsurprisingly, since we were not updating the subnets themselves, the output has not changed:

    us-east-1_0) subnets="['subnet-0a7491aa76f9b88d7','subnet-0f0b2dcccdcbc7c1d','subnet-0680badf68cbf198c','subnet-02b25dd65f806e41b','subnet-010235a3bff34cf6f','subnet-085c78d8c562b5a51']";;
    us-east-2_0) subnets="['subnet-0ea117d9499ef624f','subnet-00adc83d4719d4176','subnet-0b9399990fa424d7f','subnet-060d997b25f5bb922','subnet-015f4e65b0ef1b0e1','subnet-02296b47817923bfb']";;
    us-west-1_0) subnets="['subnet-0d003f08a541855a2','subnet-04007c47f50891b1d','subnet-02cdb70a3a4beb754','subnet-0d813eca318034290']";;
    us-west-2_0) subnets="['subnet-05d8f8ae35e720611','subnet-0f3f254b13d40e352','subnet-0e23da17ea081d614','subnet-0f380906f83c55df7','subnet-0a2c5167d94c1a5f8','subnet-01375df3b11699b77']";;

so no need to update ipi-conf-aws-blackholenetwork-commands.sh.

I generated the reaper keep-list following 1b21187 (ci-operator: Fresh AWS shared subnets for us-east-2, etc., 2020-01-30, openshift#6949):

    for REGION in us-east-1 us-east-2 us-west-1 us-west-2
    do
      for INDEX in 1
      do
        NAME="do-not-delete-shared-vpc-blackhole-${INDEX}"
        aws --region "${REGION}" resourcegroupstaggingapi get-resources --tag-filters "Key=aws:cloudformation:stack-name,Values=${NAME}" --query 'ResourceTagMappingList[].ResourceARN[]' | jq -r ".[] | . + \" # CI exclusion per DPP-5789, ${REGION} ${NAME}\""
      done
    done | sort

and passed that along to the Developer Productivity Platform (DPP) folks so they can update their reaper config.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1769223
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1875773
[3]: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ec2-vpcendpoint.html
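For reference, a minimal CloudFormation sketch of the interface-endpoint approach the commit message describes. Only the EC2Endpoint resource name comes from the diff above; VPC, VpcCidr, PublicSubnet, and EndpointSecurityGroup are placeholder names, not necessarily the template's actual identifiers:

      EndpointSecurityGroup:  # placeholder name; allows HTTPS from the VPC to the endpoint interfaces
        Type: "AWS::EC2::SecurityGroup"
        Properties:
          GroupDescription: Allow HTTPS to interface VPC endpoints
          VpcId: !Ref VPC
          SecurityGroupIngress:
          - IpProtocol: tcp
            FromPort: 443
            ToPort: 443
            CidrIp: !Ref VpcCidr
      EC2Endpoint:
        Type: "AWS::EC2::VPCEndpoint"
        Properties:
          VpcId: !Ref VPC
          ServiceName: !Sub "com.amazonaws.${AWS::Region}.ec2"
          VpcEndpointType: Interface   # S3 keeps the older Gateway type; Interface endpoints reject RouteTableIds
          PrivateDnsEnabled: true      # ec2.<region>.amazonaws.com resolves to the endpoint ENIs inside the VPC
          SecurityGroupIds:            # required for interface endpoints
          - !Ref EndpointSecurityGroup
          SubnetIds:                   # required for interface endpoints; the commit places the interfaces in the public subnets
          - !Ref PublicSubnet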
Force-pushed from a639ef9 to ae4da13.
@wking it looks like we are now able to correctly see that the install completed, based on the nightly jobs.
Recent proxy job installed successfully (and then failed in the e2e suite). But that means the changes from this PR worked. /hold cancel
Anyone want to drop a
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ewolinetz, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@wking: Updated the following 2 configmaps:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wking: All pull requests linked via external trackers have merged: Bugzilla bug 1875773 has been moved to the MODIFIED state.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
…oint

Like ae4da13 (ci-operator/step-registry/ipi/conf/aws/blackholenetwork/blackhole_vpc_yaml: Add EC2 endpoint, 2020-09-09, openshift#11723), but for load balancers. Docs [1,2]. Rolled out using the same shell commands from the ae4da13 commit message. The machine API provider needs this for something [3] and doesn't respect the Proxy config yet [4].

[1]: https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/load-balancer-vpc-endpoints.html
[2]: https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-service.html
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1882855#c1
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1769223
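The follow-on endpoint would presumably differ from the EC2 one mainly in its service name. A sketch under the same placeholder-name assumptions as the EC2 example earlier (ELBEndpoint itself is also only an assumed resource name):

      ELBEndpoint:
        Type: "AWS::EC2::VPCEndpoint"
        Properties:
          VpcId: !Ref VPC
          ServiceName: !Sub "com.amazonaws.${AWS::Region}.elasticloadbalancing"  # ELB API, per the load-balancer endpoint docs [1]
          VpcEndpointType: Interface
          PrivateDnsEnabled: true
          SecurityGroupIds:
          - !Ref EndpointSecurityGroup
          SubnetIds:
          - !Ref PublicSubnet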